
    Computational barriers in minimax submatrix detection

    This paper studies the minimax detection of a small submatrix of elevated mean in a large matrix contaminated by additive Gaussian noise. To investigate the tradeoff between statistical performance and computational cost from a complexity-theoretic perspective, we consider a sequence of discretized models which are asymptotically equivalent to the Gaussian model. Under the hypothesis that the planted clique detection problem cannot be solved in randomized polynomial time when the clique size is of smaller order than the square root of the graph size, the following phase transition phenomenon is established: as the size of the large matrix $p \to \infty$, if the submatrix size is $k = \Theta(p^{\alpha})$ for any $\alpha \in (0, 2/3)$, computational complexity constraints can incur a severe penalty on the statistical performance, in the sense that any randomized polynomial-time test is minimax suboptimal by a polynomial factor in $p$; if $k = \Theta(p^{\alpha})$ for any $\alpha \in (2/3, 1)$, minimax optimal detection can be attained within constant factors in linear time. Using the Schatten norm loss as a representative example, we show that the hardness of attaining the minimax estimation rate can crucially depend on the loss function. Implications on the hardness of support recovery are also obtained. Comment: Published at http://dx.doi.org/10.1214/14-AOS1300 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
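
    For intuition, below is a minimal sketch of a linear-time detection statistic of the kind the computationally efficient regime permits: the grand-sum test. The threshold constant and the unit-variance Gaussian noise are illustrative assumptions; the paper's linear-time procedure is more refined.

    ```python
    import numpy as np

    def grand_sum_test(Y, c=2.0):
        """Illustrative linear-time detector: reject the null (pure noise)
        when the standardized sum of all entries is large.

        Y : p x p observation, entrywise N(0, 1) noise under the null
        c : assumed threshold constant (not from the paper)
        """
        p = Y.shape[0]
        stat = Y.sum() / p                    # ~ N(0, 1) under the null
        return stat > c * np.sqrt(np.log(p))  # illustrative threshold
    ```

    Under the alternative, the standardized sum shifts by roughly $k^2 \lambda / p$ for a $k \times k$ block of elevated mean $\lambda$, so this single statistic already detects sufficiently strong signals in time linear in the matrix size.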

    Discussion of "Frequentist coverage of adaptive nonparametric Bayesian credible sets"

    Discussion of "Frequentist coverage of adaptive nonparametric Bayesian credible sets" by Szab\'o, van der Vaart and van Zanten [arXiv:1310.4489v5].Comment: Published at http://dx.doi.org/10.1214/15-AOS1270D in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Sparse principal component analysis and iterative thresholding

    Principal component analysis (PCA) is a classical dimension reduction method which projects data onto the principal subspace spanned by the leading eigenvectors of the covariance matrix. However, it behaves poorly when the number of features p is comparable to, or even much larger than, the sample size n. In this paper, we propose a new iterative thresholding approach for estimating principal subspaces in the setting where the leading eigenvectors are sparse. Under a spiked covariance model, we find that the new approach recovers the principal subspace and leading eigenvectors consistently, and even optimally, in a range of high-dimensional sparse settings. Simulated examples also demonstrate its competitive performance. Comment: Published at http://dx.doi.org/10.1214/13-AOS1097 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
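
    As a rough illustration of the iterate-then-threshold idea, here is a minimal sketch of orthogonal iteration with entrywise hard thresholding. The random initialization, the thresholding rule, and the level `thresh` are placeholder assumptions; the paper's algorithm uses a data-driven initialization and a more careful thresholding scheme.

    ```python
    import numpy as np

    def iterative_thresholding_pca(S, r, thresh, n_iter=50, seed=0):
        """Sketch: estimate an r-dimensional sparse principal subspace by
        repeating (multiply by S) -> (threshold entries) -> (orthonormalize).

        S      : p x p sample covariance matrix
        r      : target subspace dimension
        thresh : assumed entrywise threshold level, e.g. ~ sqrt(log p / n)
        """
        p = S.shape[0]
        rng = np.random.default_rng(seed)
        Q, _ = np.linalg.qr(rng.standard_normal((p, r)))  # illustrative init
        for _ in range(n_iter):
            T = S @ Q                     # power/orthogonal iteration step
            T[np.abs(T) < thresh] = 0.0   # kill small entries -> sparsity
            Q, _ = np.linalg.qr(T)        # re-orthonormalize the basis
        return Q
    ```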

    Optimal Estimation and Rank Detection for Sparse Spiked Covariance Matrices

    This paper considers sparse spiked covariance matrix models in the high-dimensional setting and studies the minimax estimation of the covariance matrix and the principal subspace, as well as minimax rank detection. The optimal rate of convergence for estimating the spiked covariance matrix under the spectral norm is established, which requires significantly different techniques from those for estimating other structured covariance matrices such as bandable or sparse covariance matrices. We also establish the minimax rate under the spectral norm for estimating the principal subspace, the primary object of interest in principal component analysis. In addition, the optimal rate for the rank detection boundary is obtained. This result also resolves a gap in a recent paper by Berthet and Rigollet [1], where the special case of rank one is considered.
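
    One simple estimator in this sparse spiked setting, sketched below assuming unit noise variance, selects spike coordinates by thresholding the diagonal of the sample covariance and retains the corresponding block. This is an illustrative simplification, not the paper's exact minimax procedure; `tau` is an assumed tuning parameter.

    ```python
    import numpy as np

    def spiked_cov_estimate(X, tau):
        """Sketch: support selection by diagonal thresholding, then keep the
        selected block of the sample covariance and the identity elsewhere.

        X   : n x p data matrix with i.i.d. rows, noise variance assumed 1
        tau : assumed threshold on the excess variance of spike coordinates
        """
        n, p = X.shape
        S = X.T @ X / n
        support = np.flatnonzero(np.diag(S) > 1.0 + tau)  # candidate spikes
        Sigma_hat = np.eye(p)
        if support.size:
            block = np.ix_(support, support)
            Sigma_hat[block] = S[block]   # sample covariance on the support
        return Sigma_hat, support
    ```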

    Rate Optimal Denoising of Simultaneously Sparse and Low Rank Matrices

    We study minimax rates for denoising simultaneously sparse and low rank matrices in high dimensions. We show that an iterative thresholding algorithm achieves (near) optimal rates adaptively under mild conditions for a large class of loss functions. Numerical experiments on synthetic datasets also demonstrate the competitive performance of the proposed method.
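
    A minimal sketch that conveys the flavor of such an iterative scheme alternates entrywise hard thresholding with rank-r truncation. The entrywise (rather than structured row/column) sparsity, the level `thresh`, and the fixed iteration count are assumptions for illustration.

    ```python
    import numpy as np

    def sparse_lowrank_denoise(Y, r, thresh, n_iter=30):
        """Sketch: denoise Y by alternating projections toward matrices that
        are simultaneously entrywise sparse and of rank at most r."""
        def rank_r(M):
            U, s, Vt = np.linalg.svd(M, full_matrices=False)
            return (U[:, :r] * s[:r]) @ Vt[:r]        # best rank-r approximation
        M = rank_r(Y)
        for _ in range(n_iter):
            M = np.where(np.abs(M) < thresh, 0.0, M)  # enforce sparsity
            M = rank_r(M)                             # re-enforce low rank
        return M
    ```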

    Optimal Rates of Convergence for Noisy Sparse Phase Retrieval via Thresholded Wirtinger Flow

    This paper considers the noisy sparse phase retrieval problem: recovering a sparse signal $x \in \mathbb{R}^p$ from noisy quadratic measurements $y_j = (a_j' x)^2 + \epsilon_j$, $j = 1, \ldots, m$, with independent sub-exponential noise $\epsilon_j$. The goals are to understand the effect of the sparsity of $x$ on the estimation precision and to construct a computationally feasible estimator that achieves the optimal rates. Inspired by the Wirtinger Flow [12] proposed for noiseless and non-sparse phase retrieval, a novel thresholded gradient descent algorithm is proposed and shown to adaptively achieve the minimax optimal rates of convergence over a wide range of sparsity levels when the $a_j$'s are independent standard Gaussian random vectors, provided that the sample size is sufficiently large compared to the sparsity of $x$. Comment: 28 pages, 4 figures.
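
    The following sketch shows the shape of such a thresholded gradient descent: a gradient step on the quadratic loss followed by soft thresholding. The step size, the fixed threshold (the paper uses an adaptive schedule), and the initializer `x0` (the paper uses a tailored spectral initialization) are placeholder assumptions.

    ```python
    import numpy as np

    def soft(z, t):
        """Entrywise soft-thresholding operator."""
        return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

    def thresholded_gradient_descent(A, y, x0, step=0.01, thresh=0.1, n_iter=200):
        """Sketch: minimize (1/4m) * sum_j ((a_j' x)^2 - y_j)^2 by gradient
        descent, soft-thresholding each iterate to promote sparsity.

        A : m x p matrix whose rows are the measurement vectors a_j
        y : noisy quadratic measurements y_j = (a_j' x)^2 + eps_j
        """
        x = x0.astype(float).copy()
        m = len(y)
        for _ in range(n_iter):
            Ax = A @ x
            grad = A.T @ ((Ax**2 - y) * Ax) / m   # gradient of the loss
            x = soft(x - step * grad, thresh)     # thresholded update
        return x
    ```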

    Sparse PCA: Optimal rates and adaptive estimation

    Principal component analysis (PCA) is one of the most commonly used statistical procedures, with a wide range of applications. This paper considers both minimax and adaptive estimation of the principal subspace in the high-dimensional setting. Under mild technical conditions, we first establish the optimal rates of convergence for estimating the principal subspace, which are sharp with respect to all the parameters, thus providing a complete characterization of the difficulty of the estimation problem in terms of the convergence rate. The lower bound is obtained by calculating the local metric entropy and applying Fano's lemma. The rate-optimal estimator is constructed using aggregation, which, however, might not be computationally feasible. We then introduce an adaptive procedure for estimating the principal subspace which is fully data-driven and can be computed efficiently. It is shown that the estimator attains the optimal rates of convergence simultaneously over a large collection of parameter spaces. A key idea in our construction is a reduction scheme which reduces the sparse PCA problem to a high-dimensional multivariate regression problem. This method is potentially also useful for other related problems. Comment: Published at http://dx.doi.org/10.1214/13-AOS1178 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
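
    To make the reduction idea concrete, here is an illustrative rendering, not the paper's exact procedure: split the sample, form preliminary scores on one half, then fit a row-thresholded multivariate regression on the other half, so that row sparsity of the coefficient matrix encodes the sparsity of the loadings. The sample split, the plain-PCA initialization, and the threshold `thresh` are all assumptions.

    ```python
    import numpy as np

    def adaptive_sparse_pca(X, r, thresh):
        """Sketch: sparse PCA via a regression-type reduction.

        X      : n x p data matrix
        r      : target subspace dimension
        thresh : assumed row-norm threshold for the coefficient matrix
        """
        n, p = X.shape
        X1, X2 = X[: n // 2], X[n // 2:]
        _, _, Vt = np.linalg.svd(X1, full_matrices=False)
        scores = X2 @ Vt[:r].T                    # preliminary PC scores
        # Multivariate least squares of X2 on the scores: coefficients p x r.
        B, *_ = np.linalg.lstsq(scores, X2, rcond=None)
        B = B.T
        B[np.linalg.norm(B, axis=1) < thresh] = 0.0  # row-wise hard threshold
        Q, _ = np.linalg.qr(B)                    # orthonormal sparse basis
        return Q
    ```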